Probabilistic Inference in Machine Learning. MSc in Data Science Methodology. Barcelona School of Economics.

Supervisor: Prof. Lorenzo Cappello, PhD

Importing the relevant libraries:

library(gridExtra)
library(rstanarm)
Loading required package: Rcpp
This is rstanarm version 2.32.1
- See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
- Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
- For execution on a local, multicore CPU with excess RAM we recommend calling
  options(mc.cores = parallel::detectCores())
library(arm)
Loading required package: MASS
Loading required package: Matrix
Loading required package: lme4

arm (Version 1.14-4, built: 2024-4-1)
Working directory is /home/oliver/Documents/Term 2/Propabilistic Inference/Bayesian_Algorithm

Attaching package: 'arm'
The following objects are masked from 'package:rstanarm':

    invlogit, logit
library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following object is masked from 'package:MASS':

    select
The following object is masked from 'package:gridExtra':

    combine
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(rstan)
Loading required package: StanHeaders

rstan version 2.32.6 (Stan version 2.32.2)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
change `threads_per_chain` option:
rstan_options(threads_per_chain = 1)

Attaching package: 'rstan'
The following object is masked from 'package:arm':

    traceplot
library(bayesplot)
This is bayesplot version 1.11.1
- Online documentation and vignettes at mc-stan.org/bayesplot
- bayesplot theme set to bayesplot::theme_default()
   * Does _not_ affect other ggplot2 plots
   * See ?bayesplot_theme_set for details on theme setting

We set options for rstan to use multiple cores for faster computation:

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

Clear the environment:

rm(list=ls())

Abstract

1. Introduction

In this project, we perform a hierarchical Bayesian analysis of the radon dataset. Our objective is to explore the radon levels in houses across various counties and understand how the levels vary both within and between counties. Inspired by Gelman and Hill’s work on hierarchical models, we build on this foundation by implementing an enhanced Hamiltonian Monte Carlo approach in Stan. This method is expected to provide efficient and robust posterior inference for our multilevel model.

2. Exploring the Data

In this section, we explore the radon dataset to understand its structure and key variables.

Importing and Inspecting the Data

First, we load the radon dataset. We can do this by using the load method as the dataset is included in the rstan package. Alternatively, we added the file as a CVS in the Project.zip:

data("radon") 
#radon<-read.csv("radon_exported.csv") if the above gives errors due to package incompatibility. 

Let’s inspect the first few rows:

head(radon)
  floor county  log_radon log_uranium
1     1 AITKIN 0.83290912  -0.6890476
2     0 AITKIN 0.83290912  -0.6890476
3     0 AITKIN 1.09861229  -0.6890476
4     0 AITKIN 0.09531018  -0.6890476
5     0  ANOKA 1.16315081  -0.8473129
6     0  ANOKA 0.95551145  -0.8473129

The dataset contains the following variables:

  • \(floor\): Indicates whether the measurement was taken in a basement (coded as 0) or on an upper floor (coded as 1).

  • \(county\): Identifies the county for each observation.

  • \(log_radon\): The natural logarithm of the radon measurement. This transformation is applied to stabilize variance and approach normality in the distribution of radon levels.

  • \(log_uranium\): The natural logarithm of the uranium measurement. Uranium concentration in the soil is considered a potential predictor of radon levels, as radon is a decay product of uranium.

We also check the dimensions of the dataset:

dim(radon)
[1] 919   4

This dataset consists of 919 observations.

Distribution of Radon Levels

To understand the distribution of radon levels, we visualize the histogram of the log-transformed radon measurements:

ggplot(radon, aes(x = log_radon)) + 
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Log Radon Levels", x = "Log Radon", y = "Frequency")

The distribution of log-transformed radon levels is roughly unimodal, centered around 1 to 2 on the log scale, and shows a moderate right tail. The log transformation reduces skew compared to raw radon measurements and helps stabilize variance, making it more amenable to hierarchical modeling.

County-Level Differences

Next, we explore how radon levels vary by county. A boxplot of log_radon across counties provides a clear visualization of the differences:

ggplot(radon, aes(x = factor(county), y = log_radon)) + 
  geom_boxplot(fill = "lightgreen", color = "black") +
  labs(title = "Radon Levels by County", x = "County", y = "Log Radon") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

There is clear variation in log radon levels across counties, with some counties showing higher or lower median values than others. Within each county, there is also notable spread in the data. This suggests both between-county and within-county variability and reinforces the need for a hierarchical modeling approach that can capture these different levels of variation.

Additionally, we also compute and plot the mean log radon level for each county:

county_summary <- radon %>%
  group_by(county) %>%
  summarise(mean_log_radon = mean(log_radon, na.rm = TRUE),
            n = n())

ggplot(county_summary, aes(x = reorder(county, mean_log_radon), y = mean_log_radon)) +
  geom_point() +
  coord_flip() +
  scale_x_discrete(expand = expansion(mult = c(0.05, 0.05))) +
  labs(title = "Mean Log Radon Levels by County",
       x = "County",
       y = "Mean Log Radon") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 4))

Again, this shows that average radon levels vary noticeably across counties, with some counties having much lower mean log radon than others. The clear gradient from left to right indicates substantial between-county variation, which supports using a hierarchical model to capture these differences alongside the within-county variability.

Exploring Relevant Predictors

An important predictor in our dataset is the \(floor\) variable, which indicates if the measurement is from a basement (0) or an upper floor (1). We examine how radon levels differ based on this predictor:

ggplot(radon, aes(x = factor(floor), y = log_radon)) + 
  geom_boxplot(fill = "orange", color = "black") +
  labs(title = "Radon Levels by Floor (Basement vs Upper Floor)", 
       x = "Floor (0 = Basement, 1 = Upper Floor)", y = "Log Radon")

We can see that homes measured in the basement tend to have higher radon levels than those measured on an upper floor. Although there is overlap between the two groups, the boxplot shows a higher median and upper quartile for basement measurements, which aligns with the idea that radon gas often enters through the ground and accumulates more in lower levels of a building.

Note: Additionally, including log_uranium as a predictor in the model allows us to account for potential covariate effects that might explain some of the variability in radon levels.

Identifying the Hierarchical Structure

The radon data are naturally grouped by county, meaning that multiple observations (houses) are nested within each county. This leads to two sources of variability:

  • Within-County Variability: Variation in radon levels among houses within the same county.

  • Between-County Variability: Variation in the average radon level from one county to another.

To further illustrate this structure, we summarize key statistics by county:

county_stats <- radon %>%
  group_by(county) %>%
  summarise(n = n(), 
            mean_log_radon = mean(log_radon, na.rm = TRUE), 
            sd_log_radon = sd(log_radon, na.rm = TRUE))
county_stats
# A tibble: 85 × 4
   county        n mean_log_radon sd_log_radon
   <fct>     <int>          <dbl>        <dbl>
 1 AITKIN        4          0.715        0.432
 2 ANOKA        52          0.891        0.718
 3 BECKER        3          1.09         0.717
 4 BELTRAMI      7          1.19         0.894
 5 BENTON        4          1.28         0.415
 6 BIGSTONE      3          1.54         0.504
 7 BLUEEARTH    14          1.93         0.542
 8 BROWN         4          1.65         0.595
 9 CARLTON      10          0.977        0.585
10 CARVER        6          1.22         1.90 
# ℹ 75 more rows

The table shows that each of the 85 counties has a different number of observations (n) and distinct mean and standard deviation of log radon levels. Some counties have very few data points (e.g., 3 or 4), while others have many more (e.g., 52). The mean log radon values also vary considerably across counties, as do their standard deviations. This variation in both sample size and radon levels by county underscores the need for a hierarchical model that can account for county-to-county differences (random effects) while borrowing strength across counties.

3. Model Specifications

Baseline Model: Hierarchical Model Formulation

We aim to model the log-transformed radon level \(y_{ij}\) for house \(i\) in county \(j\). We include a fixed effect for whether the measurement was taken in the basement (0) or on an upper floor (1). In addition, each county has its own intercept, reflecting the idea that some counties may have generally higher or lower radon levels. Note that this model is inspired by the hierarchical models presented by Gelman and Hill (2007). We can write this as:

\[ \begin{aligned} y_{ij} &\sim \mathcal{N}(\alpha_j + \beta \, x_{ij}, \sigma^2), \\[6pt] \alpha_j &\sim \mathcal{N}(\mu_\alpha, \tau^2), \end{aligned} \] where:

  • \(y_{ij}\) is the log radon measurement for house \(i\) in county \(j\).

  • \(x_{ij}\) is the floor variable (0 = basement, 1 = upper floor).

  • \(\alpha_j\) is the county-specific intercept, which follows a normal distribution centered at \(\mu_\alpha\) with standard deviation \(\tau\).

  • \(\beta\) is the fixed effect for the floor predictor.

  • \(\sigma\) is the within-county (house-level) residual standard deviation.

  • \(\mu_\alpha\) is the overall (global) intercept across all counties.

  • \(\tau\) captures the between-county variability.

To specify our model, we used the following distributions for the parameters:

\[ \begin{aligned} \mu_\alpha &\sim \mathcal{N}(0, 10^2), \quad \tau &\sim \text{Cauchy}^+(0, 2.5), \quad \beta &\sim \mathcal{N}(0, 10^2), \quad \sigma \sim \text{Cauchy}^+(0, 2.5). \end{aligned} \]

We shall denote this model our baseline model. The corresponding plate diagram can be seen in the file baseline_model.jpeg.

Goal of Bayesian Inference

The goal of this Bayesian hierarchical model is to estimate the posterior distributions of all unknown parameters:

  • \(\beta\) (effect of basement vs. upper floor),

  • \(\mu_\alpha\) (overall average intercept),

  • \(\tau\) (between-county variability),

  • \(\sigma\) (within-county variability), and

  • each county-specific intercept \(\alpha_j\).

By estimating these posterior distributions, we can quantify both the uncertainty at the county level (how counties differ from one another) and the uncertainty at the house level (residual variability around each county’s intercept).

We will implement our baseline model in the chapter/code below. The corresponding model can be found in the file radon_model_centered.stan We load our Stan model (saved as radon_model_centered.stan), prepare the data for modeling, run the posterior sampler using Hamiltonian Monte Carlo (HMC) via the No-U-Turn Sampler (NUTS), and perform several diagnostic checks. NUTS is particularly effective for hierarchical models, as it uses gradient information to efficiently explore high-dimensional posterior distributions.

Data Preparation for Stan

We first prepare the data in a list that matches the format expected by the Stan model. Note that we convert the county variable to an integer to serve as an index for the hierarchical structure.

radon_data <- list(
  N = nrow(radon),
  J = length(unique(radon$county)),
  county = as.integer(radon$county),
  y = radon$log_radon,
  x = radon$floor
)
#print(radon_data)

Running the Stan Sampler

We run our model using the No-U-Turn Sampler (NUTS), an advanced variant of Hamiltonian Monte Carlo (HMC). NUTS leverages gradient information from the posterior to efficiently navigate high-dimensional parameter spaces. It adaptively determines the optimal path length (i.e., the number of leapfrog steps), thereby avoiding unnecessary computations and manual tuning. This automatic adaptation helps ensure that the sampler explores the posterior distribution thoroughly, leading to better convergence and improved effective sample sizes. In our analysis, we begin with a baseline setup and aim to further refine the sampler in the subsequent parts.

We now run our Stan model using 4 chains with 2000 iterations each (1000 warmup iterations and 1000 sampling iterations):

fit <- stan(file = "radon_model_centered.stan",
            data = radon_data,
            iter = 2000,
            warmup = 1000,
            chains = 4,
            seed = 123)

Summary of the posterior distributions for the key parameters:

print(fit, pars = c("beta", "mu_alpha", "tau", "sigma"), probs = c(0.025, 0.5, 0.975))
Inference for Stan model: anon_model.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

          mean se_mean   sd  2.5%   50% 97.5% n_eff Rhat
beta     -0.66       0 0.07 -0.79 -0.66 -0.53  4247    1
mu_alpha  1.49       0 0.05  1.39  1.49  1.59  3114    1
tau       0.32       0 0.05  0.24  0.32  0.42  1365    1
sigma     0.73       0 0.02  0.69  0.73  0.76  6424    1

Samples were drawn using NUTS(diag_e) at Mon Mar 24 15:43:50 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Parameter Estimates:

  • \(\beta\): The fixed effect for the floor variable has a mean of -0.66, with a 95% credible interval from -0.79 to -0.66. This negative value suggests that, all else being equal, measurements taken on an upper floor (assuming floor is coded as 1 for upper floor) are associated with lower log radon levels compared to basement measurements.

  • \(\mu_\alpha\): The overall (global) intercept is estimated at 1.49, with a credible interval from 1.39 to 1.60. This represents the average log radon level for the baseline (reference) group.

  • \(\tau\): The between-county standard deviation is estimated at 0.32 (credible interval: 0.24 to 0.42), quantifying the variability in county-specific intercepts.

  • \(\sigma\): The residual standard deviation (within-county variability) is estimated at 0.73 (credible interval: 0.69 to 0.76).

Diagnostic Checks

After sampling, it is important to check that the chains have converged and that the posterior exploration is adequate. We use several diagnostic tools available in the bayesplot package.

Trace Plots

Trace plots provide to us a visual representation of the sampling paths across chains. We expect that well-mixed chains should appear as overlapping, horizontal “fuzzy caterpillars.”

We convert the fitted model to an array for bayesplot and investigate the traces:

posterior_samples <- as.array(fit)
mcmc_trace(posterior_samples, pars = c("beta", "mu_alpha", "tau", "sigma"))

From these trace plots, we see that all four chains appear to have converged and are mixing well. Each chain fluctuates around a similar central value for each parameter, and we observe no pronounced upward or downward trends. The overlap among chains is substantial, suggesting that we have reached a stable sampling distribution. We also see no abrupt jumps or signs of poor mixing, which reinforces that our NUTS sampler is exploring the posterior efficiently.

Pair Plots

Pair plots help us assess potential correlations between parameters and they also provide an overview of the joint posterior distributions.

mcmc_pairs(posterior_samples, pars = c("beta", "mu_alpha", "tau", "sigma"))

Marginal Distributions (Diagonals):

Each parameter’s histogram is roughly unimodal and does not exhibit strong skew or multiple modes. This suggests that the posterior for each parameter is relatively stable and well-identified.

Pairwise Relationships (Off-Diagonals):

The scatter plots show how each parameter’s posterior samples relate to the others. We do not see strong linear or nonlinear patterns, indicating that no pair of parameters is highly correlated. This relative lack of strong correlation is a good sign for efficient sampling and clear parameter interpretation.

Overall Posterior Behavior:

The tight, roughly elliptical scatter clouds in the off-diagonal plots, combined with unimodal marginals, suggest that our model converged well and that the parameters are well estimated. There is no visual evidence of pathological behavior such as multi-modality or heavy tail correlations that could impede inference.

Convergence Statistics

In this subsection, we demonstrate several methods we have learned for assessing convergence of our Markov chains. By confirming that our sampler has stabilized around the same region across chains, exhibits minimal autocorrelation, and has sufficiently large effective sample sizes, we gain confidence that the posterior draws are reliable.

Multiple Chains and R-hat

We ran multiple Markov chains in parallel. When each chain converges to the same region of parameter space, it provides evidence that the sampler is exploring the true posterior rather than getting stuck in a local mode. In Stan’s output, the \(\hat{R}\) statistic quantifies the potential scale reduction factor.

rhat_values <- rhat(fit, pars = c("beta", "mu_alpha", "tau", "sigma"))
print(rhat_values)
     beta  mu_alpha       tau     sigma 
0.9992642 0.9998036 1.0009017 0.9993727 

An \(\hat{R}\) value of about 1 indicates perfect convergence, meaning that the variability between the chains is nearly identical to the variability within each chain. Typically, values below 1.1 are considered acceptable, so our values confirm that our chains have converged well. This gives us confidence that the posterior estimates are reliable and that our sampler has thoroughly explored the target distribution.

Effective Sample Size (ESS)

The effective sample size (ESS) measures how many nearly independent draws we have relative to the total samples. High autocorrelation within a chain reduces the ESS. Stan reports both an absolute ESS and an ESS ratio. A higher ESS indicates more information for reliable posterior estimation.

ess_values <- neff_ratio(fit, pars = c("beta", "mu_alpha", "tau", "sigma"))
print(ess_values)
     beta  mu_alpha       tau     sigma 
1.0617332 0.7784023 0.3412737 1.6060289 

The ESS ratio for \(\beta\) (1.06) and \(\sigma\) (1.60) indicate that these parameters are sampled with high efficiency — i.e., the chains for these parameters have relatively low autocorrelation. On the other hand, \(\mu_\alpha\) (0.78) and \(\tau\) (0.34) have lower ESS ratios, suggesting that their chains exhibit higher autocorrelation and thus provide fewer independent draws. Although these lower ratios might still be acceptable, they indicate that further tuning or additional iterations could be beneficial to improve the precision of our estimates for these parameters.

Autocorrelation

Even if the chains converge, strong autocorrelation in the draws can reduce the effective sample size. We can visualize the autocorrelation function (ACF) for each parameter using bayesplot.

posterior_samples <- as.array(fit)
mcmc_acf_bar(posterior_samples, pars = c("beta", "mu_alpha", "tau", "sigma"))

CHANGE INTERPRETATION HERE – We see that \(\beta\) and \(\sigma\) exhibit a rapid decay toward zero, indicating that successive draws for these parameters are relatively uncorrelated. On the other hand, \(\mu_\alpha\) and \(\tau\) show a slower decay, suggesting higher autocorrelation. This finding aligns with the lower ESS ratios observed for these parameters. While the chains do converge overall, the slower decay in autocorrelation for \(\mu_\alpha\) and \(\tau\) implies that we might need additional iterations — or a different parameterization — to achieve a higher number of effectively independent draws for those parameters.

Density Plots

Density plots help visualize the marginal posterior distributions of each parameter. By overlaying densities from each chain, we can quickly assess whether the chains have converged to the same distribution and identify any irregularities such as multimodality or heavy tails.

posterior_samples <- as.array(fit)
mcmc_dens_overlay(posterior_samples, pars = c("beta", "mu_alpha", "tau", "sigma"))

These overlaid density plots show that all four chains produce nearly identical posterior distributions for each parameter, indicating strong convergence and no evidence of multi-modality. Each parameter’s distribution appears unimodal, and the curves from different chains overlap closely. This alignment suggests that the sampler has converged to a stable region of parameter space and that our posterior estimates for \(\beta\), \(\mu_\alpha\), \(\tau\), and \(\sigma\) are both consistent and reliable.

5. Algorithm Enhancement

5.1 Non-Centered and Centered Parametrization

Our first improvement of this model is based on Papaspiliopoulos, Roberts, and Sköld (2003). According to Papaspiliopoulos, Roberts, and Sköld (2003), non-centered parameterization can improve sampling efficiency in hierarchical models. Instead of sampling \(\alpha_j\) directly, we introduce a latent variable \(\alpha_{\text{raw}, j}\) that follows a standard normal, and then transform it:

\[\alpha_j = \mu_\alpha + \alpha_{\text{raw}, j} \cdot \tau.\]

To recap, In the centered parameterization on the other hand, we model the county-level intercepts \(\alpha_j\) directly as being drawn from a common distribution with parameters \(\mu_\alpha\) and \(\tau\).

This model assumes that each county-specific intercept \(\alpha_j\) is drawn from a population distribution centered at \(\mu_\alpha\) with standard deviation \(\tau\), allowing for partial pooling of information across counties. The slope \(\beta\) represents the effect of floor level, and \(\sigma\) is the residual variability within counties.

We call this the centered parameterization because we sample \(\alpha_j\) directly from its normal distribution:

\[ \alpha_j \sim \mathcal{N}(\mu_\alpha, \tau^2), \]

as opposed to defining a standard normal variable and transforming it (as in the non-centered case).

In the non-centered parameterization, we instead define:

\[ \alpha_j = \mu_\alpha + \tau \cdot \alpha_{\text{raw},j}, \quad \text{with } \alpha_{\text{raw},j} \sim \mathcal{N}(0, 1), \]

Theoretically, this reparameterization can improve convergence and sampling efficiency, especially when the data are weakly informative about group-level variation (e.g., sparse counties or small \(\tau\)). In contrast, the centered parameterization tends to work better when group-level effects are well-identified from the data (e.g., larger sample size per group, stronger signals).

In the next section, we implement the centered model, in such a way that we can compare it to the non-centered version.

5.2 Fitting the non-centered: A comparison

As done before, we load our Stan model (saved as radon_model_non-centered.stan), prepare the data for modeling, run the posterior sampler using Hamiltonian Monte Carlo (HMC) via the No-U-Turn Sampler (NUTS), and perform several diagnostic checks. For simplicity and clarity, we decided to have the whole process of fitting the model withing the next code chunk.

We now run our Stan model using 4 chains with 2000 iterations each (1000 warmup iterations and 1000 sampling iterations):

fit_nc <- stan(file = "radon_model_non-centered.stan",
            data = radon_data,
            iter = 2000,
            warmup = 1000,
            chains = 4,
            seed = 123)

print(fit_nc, pars = c("beta", "mu_alpha", "tau", "sigma"), probs = c(0.025, 0.5, 0.975))

Parameter Estimates:

Overall, the posterior estimates for the key parameters are very similar between the centered and non-centered models. This reflects the fact that both parameterizations describe the same underlying model. However, differences could still arise in the efficiency of the sampling process, which we examine in the next diagnostic checks.

Diagnostics

Just like before, we can see how the chains have converged, very similiar to before. The trace plots indicate good mixing and convergence, with no divergent transitions or drift. This assessment is consistent with our results above, indicating that both models reach the same posterior distribution.

posterior_samples_nc <- as.array(fit_nc)
mcmc_trace(posterior_samples_nc, pars = c("beta", "mu_alpha", "tau", "sigma"))

mcmc_trace(posterior_samples, pars = c("beta", "mu_alpha", "tau", "sigma"))

mcmc_pairs(posterior_samples, pars = c("beta", "mu_alpha", "tau", "sigma"))

mcmc_pairs(posterior_samples_nc, pars = c("beta", "mu_alpha", "tau", "sigma"))

convergence

As part of our convergence diagnostics, we examined both the potential scale reduction factor rhat and the effective sample size (ESS).

Just like in the basemodel, we will analyze the values. We can obviously see some differences here.

rhat_values_nc <- rhat(fit_nc, pars = c("beta", "mu_alpha", "tau", "sigma"))
print(rhat_values_nc)
     beta  mu_alpha       tau     sigma 
0.9997330 1.0011367 1.0027120 0.9994411 
print(rhat_values)
     beta  mu_alpha       tau     sigma 
0.9992642 0.9998036 1.0009017 0.9993727 

Unlike before, we can see that not only \(\beta\) but also all of the other parameters have a value close to 1. This suggests stronger convergence across chains, particularly for the group-level variance components, which are known to suffer from poor mixing in the centered parameterization. We conclude that the non-centered version samples more efficiently and converges more robustly in this setting. Overall, this provides support for the findings findings of Papaspiliopoulos, Roberts, and Sköld (2003), confirming that non-centered parameterization can enhance sampling efficiency in hierarchical models

ess_values_nc <- neff_ratio(fit_nc, pars = c("beta", "mu_alpha", "tau", "sigma"))
print(ess_values_nc)
     beta  mu_alpha       tau     sigma 
1.8743437 0.5729788 0.3968207 1.8162346 
print(ess_values)
     beta  mu_alpha       tau     sigma 
1.0617332 0.7784023 0.3412737 1.6060289 

The effective sample size (ESS) ratios are very similar across the centered and non-centered models. In particular, the values for the hierarchical parameters \(\mu_\alpha\) and \(\tau\) remain nearly unchanged, suggesting that both parameterizations mix comparably well. This indicates that our dataset is sufficiently informative to support efficient sampling even under the centered form.

Even if the chains converge, strong autocorrelation in the draws can reduce the effective sample size. We can visualize the autocorrelation function (ACF) for each parameter using bayesplot.

mcmc_acf_bar(posterior_samples_nc, pars = c("beta", "mu_alpha", "tau", "sigma"))

mcmc_acf_bar(posterior_samples, pars = c("beta", "mu_alpha", "tau", "sigma"))

In general, we can observe very similiar plots to before. Looking closely, what might be noticable is that we see slightly more persistent autocorrelation in mu_alpha and especially tau

Density Plots

Density plots help visualize the marginal posterior distributions of each parameter. By overlaying densities from each chain, we can quickly assess whether the chains have converged to the same distribution and identify any irregularities such as multimodality or heavy tails.

par(c(1,2))
Warning in par(c(1, 2)): argument 1 does not name a graphical parameter
NULL
mcmc_dens_overlay(posterior_samples_nc, pars = c("beta", "mu_alpha", "tau", "sigma"))


mcmc_dens_overlay(posterior_samples, pars = c("beta", "mu_alpha", "tau", "sigma"))

Just like before when we estimated the parameters, the overlaid density plots show highly similar posterior distributions across both centered and non-centered parameterizations, indicating that both models estimate the same underlying parameters. This is expected, as both formulations represent the same statistical model # 5.3 Enhancment by changing Priors

Changing prios and so on.

Continue here by changing the model/enhancing it in a different way.

6. Discussion

7. Conclusion

References

Gelman, A., & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Papaspiliopoulos, O., Roberts, G. O., & Sköld, M. (2003). Non-centered parameterizations for hierarchical models and data augmentation. Bayesian Statistics, 7, 307–326.